Overview

Dataset statistics

Number of variables: 22
Number of observations: 39759
Missing cells: 15903
Missing cells (%): 1.8%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 6.7 MiB
Average record size in memory: 176.0 B
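The missing-cells percentage is taken over every cell of the table, not per column. All 15903 missing cells belong to MULTIPLE_OFFENSE (see the warnings below), and 15903 / (22 × 39759) ≈ 1.8%. Below is a minimal sketch of the same arithmetic. The list-of-dicts table and the helper names are illustrative assumptions, not the profiler's internals.

```python
def missing_cells(rows, columns):
    """Count cells that are absent or None across all rows and columns."""
    return sum(1 for row in rows for col in columns if row.get(col) is None)


def missing_cells_pct(n_missing, n_vars, n_obs):
    """Percentage of missing cells over the whole n_vars x n_obs grid."""
    return 100.0 * n_missing / (n_vars * n_obs)


# Tiny illustrative table: one missing label out of 3 x 2 = 6 cells.
rows = [
    {"X_2": 36, "MULTIPLE_OFFENSE": 0.0},
    {"X_2": 37, "MULTIPLE_OFFENSE": None},
    {"X_2": 33, "MULTIPLE_OFFENSE": 1.0},
]
print(missing_cells(rows, ["X_2", "MULTIPLE_OFFENSE"]))  # 1

# Figures from the report above.
print(round(missing_cells_pct(15903, 22, 39759), 1))  # 1.8
```

For a pandas DataFrame, the equivalent cell count is `df.isna().sum().sum()`.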

Variable types

NUM: 17
BOOL: 3
CAT: 2

Reproduction

Analysis started: 2020-06-07 11:36:14.778501
Analysis finished: 2020-06-07 11:37:12.522602
Duration: 57.74 seconds
Version: pandas-profiling v2.8.0
Command line: pandas_profiling --config_file config.yaml [YOUR_FILE.csv]

Warnings

QUARTER_OF_YEAR is highly correlated with MONTH [High correlation]
MONTH is highly correlated with QUARTER_OF_YEAR [High correlation]
MULTIPLE_OFFENSE has 15903 (40.0%) missing values [Missing]
X_10 is highly skewed (γ1 = 30.92348051) [Skewed]
X_12 is highly skewed (γ1 = 26.64404103) [Skewed]
INCIDENT_ID has unique values [Unique]
X_4 has 5588 (14.1%) zeros [Zeros]
X_5 has 7908 (19.9%) zeros [Zeros]
X_7 has 5794 (14.6%) zeros [Zeros]
X_8 has 14634 (36.8%) zeros [Zeros]
X_11 has 4268 (10.7%) zeros [Zeros]
X_12 has 8517 (21.4%) zeros [Zeros]
X_14 has 458 (1.2%) zeros [Zeros]
X_15 has 1680 (4.2%) zeros [Zeros]

Variables

df_index
Real number (ℝ≥0)

Distinct count: 23856
Unique (%): 60.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 10336.960009054554
Minimum: 0
Maximum: 23855
Zeros: 2
Zeros (%): < 0.1%
Memory size: 310.6 KiB

Quantile statistics

Minimum: 0
5th percentile: 993.9
Q1: 4969.5
Median: 9939
Q3: 14909
95th percentile: 21867.1
Maximum: 23855
Range: 23855
Interquartile range (IQR): 9939.5

Descriptive statistics

Standard deviation: 6378.244484
Coefficient of variation (CV): 0.617032907
Kurtosis: -0.893746517
Mean: 10336.96001
Median Absolute Deviation (MAD): 4970
Skewness: 0.2791311245
Sum: 410987193
Variance: 40682002.69

Histogram with fixed size bins (bins=10)

Common values (value: count, frequency)
2047: 2 (< 0.1%)
5646: 2 (< 0.1%)
9768: 2 (< 0.1%)
11817: 2 (< 0.1%)
13866: 2 (< 0.1%)
1580: 2 (< 0.1%)
3629: 2 (< 0.1%)
5678: 2 (< 0.1%)
7727: 2 (< 0.1%)
9800: 2 (< 0.1%)
Other values (23846): 39739 (99.9%)

Minimum 5 values (value: count, frequency)
0: 2 (< 0.1%)
1: 2 (< 0.1%)
2: 2 (< 0.1%)
3: 2 (< 0.1%)
4: 2 (< 0.1%)

Maximum 5 values (value: count, frequency)
23855: 1 (< 0.1%)
23854: 1 (< 0.1%)
23853: 1 (< 0.1%)
23852: 1 (< 0.1%)
23851: 1 (< 0.1%)
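These figures fit a simple explanation: df_index may be the concatenation of two zero-based row indexes. The first would have 23856 rows (the IS_TEST_DATA = 0 count reported further down) and the second 15903 rows (the IS_TEST_DATA = 1 count). In that case values 0 to 15902 occur twice and 15903 to 23855 once, which matches the count-2 minimum values and count-1 maximum values above. The report does not state this; it is an inference from the numbers. The sketch below tests the hypothesis against the reported distinct count, unique percentage and sum.

```python
from collections import Counter

# IS_TEST_DATA = 0 and IS_TEST_DATA = 1 row counts from the report.
N_TRAIN, N_TEST = 23856, 15903

# Hypothesis (assumption): train index 0..23855 followed by test index 0..15902.
df_index = list(range(N_TRAIN)) + list(range(N_TEST))
counts = Counter(df_index)

n = len(df_index)
distinct = len(counts)
print(n, distinct)                   # 39759 23856
print(round(100 * distinct / n, 1))  # 60.0  ("Unique (%)")
print(sum(df_index))                 # 410987193  ("Sum")
print(counts[0], counts[23855])      # 2 1
```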
 

INCIDENT_ID
Categorical

UNIQUE

Distinct count: 39759
Unique (%): 100.0%
Missing: 0
Missing (%): 0.0%
Memory size: 310.6 KiB
Common values (value: count, frequency)
CR_77927: 1 (< 0.1%)
CR_152304: 1 (< 0.1%)
CR_10505: 1 (< 0.1%)
CR_128864: 1 (< 0.1%)
CR_28045: 1 (< 0.1%)
CR_2602: 1 (< 0.1%)
CR_149689: 1 (< 0.1%)
CR_112250: 1 (< 0.1%)
CR_25799: 1 (< 0.1%)
CR_152939: 1 (< 0.1%)
Other values (39749): 39749 (> 99.9%)
 

Length

Max length: 9
Median length: 8
Mean length: 8.444201313
Min length: 4
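The Length block summarises the number of characters per value. The sketch below computes the same four statistics. It uses only the ten INCIDENT_ID values listed in the table above, so its results are illustrative and differ from the full-column figures.

```python
from statistics import mean, median

# The ten INCIDENT_ID values shown in the common-values table.
ids = ["CR_77927", "CR_152304", "CR_10505", "CR_128864", "CR_28045",
       "CR_2602", "CR_149689", "CR_112250", "CR_25799", "CR_152939"]


def length_summary(values):
    """Max, median, mean and min string length, as in the Length block."""
    lengths = [len(v) for v in values]
    return {
        "max": max(lengths),
        "median": median(lengths),
        "mean": mean(lengths),
        "min": min(lengths),
    }


print(length_summary(ids))
# {'max': 9, 'median': 8.5, 'mean': 8.4, 'min': 7}
```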

X_2
Real number (ℝ≥0)

Distinct count: 52
Unique (%): 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 24.763776754948566
Minimum: 0
Maximum: 52
Zeros: 40
Zeros (%): 0.1%
Memory size: 310.6 KiB

Quantile statistics

Minimum: 0
5th percentile: 4
Q1: 7
Median: 24
Q3: 36
95th percentile: 48
Maximum: 52
Range: 52
Interquartile range (IQR): 29

Descriptive statistics

Standard deviation: 15.23552157
Coefficient of variation (CV): 0.6152341673
Kurtosis: -1.307501292
Mean: 24.76377675
Median Absolute Deviation (MAD): 13
Skewness: -0.09338549112
Sum: 984583
Variance: 232.1211175

Histogram with fixed size bins (bins=10)

Common values (value: count, frequency)
4: 6724 (16.9%)
36: 3657 (9.2%)
33: 3573 (9.0%)
24: 2257 (5.7%)
21: 2088 (5.3%)
37: 1606 (4.0%)
45: 1545 (3.9%)
49: 1486 (3.7%)
3: 1307 (3.3%)
22: 1091 (2.7%)
Other values (42): 14425 (36.3%)

Minimum 5 values (value: count, frequency)
0: 40 (0.1%)
1: 33 (0.1%)
2: 194 (0.5%)
3: 1307 (3.3%)
4: 6724 (16.9%)

Maximum 5 values (value: count, frequency)
52: 25 (0.1%)
51: 162 (0.4%)
50: 279 (0.7%)
49: 1486 (3.7%)
48: 98 (0.2%)
 

X_4
Real number (ℝ≥0)

ZEROS

Distinct count: 10
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 4.279735405820066
Minimum: 0
Maximum: 10
Zeros: 5588
Zeros (%): 14.1%
Memory size: 310.6 KiB

Quantile statistics

Minimum: 0
5th percentile: 0
Q1: 2
Median: 4
Q3: 6
95th percentile: 10
Maximum: 10
Range: 10
Interquartile range (IQR): 4

Descriptive statistics

Standard deviation: 2.956637769
Coefficient of variation (CV): 0.6908459259
Kurtosis: -1.018315231
Mean: 4.279735406
Median Absolute Deviation (MAD): 2
Skewness: 0.1871304045
Sum: 170158
Variance: 8.741706897

Histogram with fixed size bins (bins=10)

Common values (value: count, frequency)
6: 9078 (22.8%)
2: 7883 (19.8%)
0: 5588 (14.1%)
7: 4781 (12.0%)
4: 3369 (8.5%)
3: 3160 (7.9%)
9: 2320 (5.8%)
10: 2113 (5.3%)
1: 1461 (3.7%)
5: 6 (< 0.1%)

Minimum 5 values (value: count, frequency)
0: 5588 (14.1%)
1: 1461 (3.7%)
2: 7883 (19.8%)
3: 3160 (7.9%)
4: 3369 (8.5%)

Maximum 5 values (value: count, frequency)
10: 2113 (5.3%)
9: 2320 (5.8%)
7: 4781 (12.0%)
6: 9078 (22.8%)
5: 6 (< 0.1%)
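The value-count tables above list all 10 distinct values of X_4, so the whole column can be rebuilt from them and the reported statistics recomputed as a cross-check. The standard-library sketch below makes these choices:

- Quantiles use linear interpolation between order statistics, the pandas and NumPy default.
- Variance and standard deviation use the n − 1 denominator.
- Skewness and kurtosis use the bias-adjusted G1 and excess-kurtosis G2 formulas that pandas applies in `Series.skew()` and `Series.kurt()`.

Whether the profiler uses exactly these definitions is an assumption, supported by the close agreement with the figures above.

```python
import math
from statistics import mean, median, stdev, variance

# Complete X_4 value counts, copied from the "Common values" table above
# (the 10 values listed account for all 39759 rows).
X4_COUNTS = {6: 9078, 2: 7883, 0: 5588, 7: 4781, 4: 3369,
             3: 3160, 9: 2320, 10: 2113, 1: 1461, 5: 6}


def expand(counts):
    """Rebuild the sorted column from a {value: count} mapping."""
    return [v for v in sorted(counts) for _ in range(counts[v])]


def quantile(sorted_xs, q):
    """Quantile by linear interpolation at position q * (n - 1)."""
    pos = q * (len(sorted_xs) - 1)
    lo = math.floor(pos)
    hi = min(lo + 1, len(sorted_xs) - 1)
    return sorted_xs[lo] + (sorted_xs[hi] - sorted_xs[lo]) * (pos - lo)


def describe(xs):
    """Recompute the report's quantile and descriptive statistics."""
    n = len(xs)
    mu = mean(xs)
    # Population central moments, used by the skewness/kurtosis formulas.
    m2 = sum((x - mu) ** 2 for x in xs) / n
    m3 = sum((x - mu) ** 3 for x in xs) / n
    m4 = sum((x - mu) ** 4 for x in xs) / n
    g1 = m3 / m2 ** 1.5
    g2 = m4 / m2 ** 2 - 3
    med = median(xs)
    sd = stdev(xs)  # sample standard deviation (n - 1 denominator)
    return {
        "sum": sum(xs),
        "mean": mu,
        "variance": variance(xs),
        "std": sd,
        "cv": sd / mu,
        "q1": quantile(xs, 0.25),
        "median": med,
        "q3": quantile(xs, 0.75),
        "mad": median(abs(x - med) for x in xs),
        # Bias-adjusted skewness (G1) and excess kurtosis (G2).
        "skewness": math.sqrt(n * (n - 1)) / (n - 2) * g1,
        "kurtosis": (n - 1) / ((n - 2) * (n - 3)) * ((n + 1) * g2 + 6),
    }


stats = describe(expand(X4_COUNTS))
for key, value in stats.items():
    print(f"{key}: {value:.6g}")
```

Among other values, the sketch yields a sum of 170158, Q1/median/Q3 of 2/4/6 and a MAD of 2, matching the report.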
 

X_5
Real number (ℝ≥0)

ZEROS

Distinct count: 5
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 2.4527528358359114
Minimum: 0
Maximum: 5
Zeros: 7908
Zeros (%): 19.9%
Memory size: 310.6 KiB

Quantile statistics

Minimum: 0
5th percentile: 0
Q1: 1
Median: 3
Q3: 5
95th percentile: 5
Maximum: 5
Range: 5
Interquartile range (IQR): 4

Descriptive statistics

Standard deviation: 1.96318419
Coefficient of variation (CV): 0.8004003343
Kurtosis: -1.556820375
Mean: 2.452752836
Median Absolute Deviation (MAD): 2
Skewness: 0.1743576897
Sum: 97519
Variance: 3.854092163

Histogram with fixed size bins (bins=10)

Common values (value: count, frequency)
5: 12238 (30.8%)
1: 11252 (28.3%)
3: 8355 (21.0%)
0: 7908 (19.9%)
2: 6 (< 0.1%)

Minimum 5 values (value: count, frequency)
0: 7908 (19.9%)
1: 11252 (28.3%)
2: 6 (< 0.1%)
3: 8355 (21.0%)
5: 12238 (30.8%)

Maximum 5 values (value: count, frequency)
5: 12238 (30.8%)
3: 8355 (21.0%)
2: 6 (< 0.1%)
1: 11252 (28.3%)
0: 7908 (19.9%)
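The histogram images did not survive in this text version; only their caption did. Below is a minimal sketch of fixed-size (equal-width) binning. It assumes the `numpy.histogram` convention: every bin is half-open except the last, which also includes the maximum. Applied to the complete X_5 value counts above, ten bins of width 0.5 show how sparse the column is, with only five bins non-empty.

```python
def fixed_width_histogram(values, bins=10):
    """Count values into `bins` equal-width bins spanning [min, max].

    Bins are half-open [left, right) except the last, which is closed,
    so the maximum lands in the final bin.
    """
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins
    counts = [0] * bins
    for v in values:
        idx = int((v - lo) / width) if width else 0
        counts[min(idx, bins - 1)] += 1  # clamp the maximum into the last bin
    edges = [lo + i * width for i in range(bins + 1)]
    return counts, edges


# Complete X_5 value counts from the table above.
X5_COUNTS = {5: 12238, 1: 11252, 3: 8355, 0: 7908, 2: 6}
x5 = [v for v, c in X5_COUNTS.items() for _ in range(c)]

counts, edges = fixed_width_histogram(x5, bins=10)
print(counts)     # [7908, 0, 11252, 0, 6, 0, 8355, 0, 0, 12238]
print(edges[:3])  # [0.0, 0.5, 1.0]
```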
 

X_6
Real number (ℝ≥0)

Distinct count: 19
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 6.126461933147212
Minimum: 1
Maximum: 19
Zeros: 0
Zeros (%): 0.0%
Memory size: 310.6 KiB

Quantile statistics

Minimum: 1
5th percentile: 1
Q1: 3
Median: 5
Q3: 8
95th percentile: 15
Maximum: 19
Range: 18
Interquartile range (IQR): 5

Descriptive statistics

Standard deviation: 4.463585046
Coefficient of variation (CV): 0.7285746806
Kurtosis: 0.06079304921
Mean: 6.126461933
Median Absolute Deviation (MAD): 3
Skewness: 0.9704193921
Sum: 243582
Variance: 19.92359146

Histogram with fixed size bins (bins=10)

Common values (value: count, frequency)
1: 5794 (14.6%)
5: 4476 (11.3%)
6: 4390 (11.0%)
4: 3869 (9.7%)
2: 3863 (9.7%)
15: 3822 (9.6%)
7: 3728 (9.4%)
3: 2909 (7.3%)
8: 2356 (5.9%)
9: 2098 (5.3%)
Other values (9): 2454 (6.2%)

Minimum 5 values (value: count, frequency)
1: 5794 (14.6%)
2: 3863 (9.7%)
3: 2909 (7.3%)
4: 3869 (9.7%)
5: 4476 (11.3%)

Maximum 5 values (value: count, frequency)
19: 5 (< 0.1%)
18: 264 (0.7%)
17: 183 (0.5%)
16: 1026 (2.6%)
15: 3822 (9.6%)
 

X_7
Real number (ℝ≥0)

ZEROS

Distinct count: 19
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 4.870947458437083
Minimum: 0
Maximum: 18
Zeros: 5794
Zeros (%): 14.6%
Memory size: 310.6 KiB

Quantile statistics

Minimum: 0
5th percentile: 0
Q1: 2
Median: 4
Q3: 7
95th percentile: 12
Maximum: 18
Range: 18
Interquartile range (IQR): 5

Descriptive statistics

Standard deviation: 3.870959307
Coefficient of variation (CV): 0.7947035644
Kurtosis: 0.5203116861
Mean: 4.870947458
Median Absolute Deviation (MAD): 3
Skewness: 0.7988064995
Sum: 193664
Variance: 14.98432596

Histogram with fixed size bins (bins=10)

Common values (value: count, frequency)
0: 5794 (14.6%)
6: 4476 (11.3%)
4: 4390 (11.0%)
2: 3869 (9.7%)
7: 3863 (9.7%)
10: 3822 (9.6%)
1: 3728 (9.4%)
5: 2909 (7.3%)
3: 2356 (5.9%)
8: 2098 (5.3%)
Other values (9): 2454 (6.2%)

Minimum 5 values (value: count, frequency)
0: 5794 (14.6%)
1: 3728 (9.4%)
2: 3869 (9.7%)
3: 2356 (5.9%)
4: 4390 (11.0%)

Maximum 5 values (value: count, frequency)
18: 240 (0.6%)
17: 327 (0.8%)
16: 339 (0.9%)
15: 39 (0.1%)
14: 31 (0.1%)
 

X_8
Real number (ℝ≥0)

ZEROS

Distinct count: 27
Unique (%): 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.9781684650016349
Minimum: 0
Maximum: 99
Zeros: 14634
Zeros (%): 36.8%
Memory size: 310.6 KiB

Quantile statistics

Minimum: 0
5th percentile: 0
Q1: 0
Median: 1
Q3: 1
95th percentile: 3
Maximum: 99
Range: 99
Interquartile range (IQR): 1

Descriptive statistics

Standard deviation: 1.46042113
Coefficient of variation (CV): 1.49301596
Kurtosis: 652.7401544
Mean: 0.978168465
Median Absolute Deviation (MAD): 1
Skewness: 14.4255057
Sum: 38891
Variance: 2.132829876

Histogram with fixed size bins (bins=10)

Common values (value: count, frequency)
1: 18329 (46.1%)
0: 14634 (36.8%)
2: 3772 (9.5%)
3: 1592 (4.0%)
4: 673 (1.7%)
5: 350 (0.9%)
6: 152 (0.4%)
7: 61 (0.2%)
8: 54 (0.1%)
10: 41 (0.1%)
Other values (17): 101 (0.3%)

Minimum 5 values (value: count, frequency)
0: 14634 (36.8%)
1: 18329 (46.1%)
2: 3772 (9.5%)
3: 1592 (4.0%)
4: 673 (1.7%)

Maximum 5 values (value: count, frequency)
99: 1 (< 0.1%)
50: 3 (< 0.1%)
40: 1 (< 0.1%)
30: 2 (< 0.1%)
29: 1 (< 0.1%)
 

X_9
Real number (ℝ≥0)

Distinct count: 7
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 4.917980834528032
Minimum: 0
Maximum: 6
Zeros: 200
Zeros (%): 0.5%
Memory size: 310.6 KiB

Quantile statistics

Minimum: 0
5th percentile: 2
Q1: 5
Median: 5
Q3: 6
95th percentile: 6
Maximum: 6
Range: 6
Interquartile range (IQR): 1

Descriptive statistics

Standard deviation: 1.367461734
Coefficient of variation (CV): 0.2780534899
Kurtosis: 1.252125374
Mean: 4.917980835
Median Absolute Deviation (MAD): 1
Skewness: -1.517828575
Sum: 195534
Variance: 1.869951595

Histogram with fixed size bins (bins=10)

Common values (value: count, frequency)
5: 17610 (44.3%)
6: 15781 (39.7%)
2: 5091 (12.8%)
3: 762 (1.9%)
1: 310 (0.8%)
0: 200 (0.5%)
4: 5 (< 0.1%)

Minimum 5 values (value: count, frequency)
0: 200 (0.5%)
1: 310 (0.8%)
2: 5091 (12.8%)
3: 762 (1.9%)
4: 5 (< 0.1%)

Maximum 5 values (value: count, frequency)
6: 15781 (39.7%)
5: 17610 (44.3%)
4: 5 (< 0.1%)
3: 762 (1.9%)
2: 5091 (12.8%)
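When a column has few distinct values, its quantiles can be read straight off the value counts without materialising all 39759 rows. The method walks the cumulative counts to find the order statistic at a given zero-based position. The sketch below applies this to the complete X_9 counts above, using linear interpolation at position q × (n − 1). It reproduces the reported 5th percentile, Q1, median, Q3 and 95th percentile. The helper names are illustrative.

```python
import math
from bisect import bisect_right
from itertools import accumulate

# Complete X_9 value counts from the table above.
X9_COUNTS = {5: 17610, 6: 15781, 2: 5091, 3: 762, 1: 310, 0: 200, 4: 5}


def make_quantile(counts):
    """Return a quantile function backed by a {value: count} table."""
    values = sorted(counts)
    cum = list(accumulate(counts[v] for v in values))  # running totals
    n = cum[-1]

    def order_stat(i):
        # Value at 0-based position i of the (virtual) sorted column.
        return values[bisect_right(cum, i)]

    def quantile(q):
        pos = q * (n - 1)
        lo = math.floor(pos)
        hi = min(lo + 1, n - 1)
        a, b = order_stat(lo), order_stat(hi)
        return a + (b - a) * (pos - lo)

    return quantile


q = make_quantile(X9_COUNTS)
print([q(p) for p in (0.05, 0.25, 0.5, 0.75, 0.95)])
# [2.0, 5.0, 5.0, 6.0, 6.0]
```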
 

X_10
Real number (ℝ≥0)

SKEWED

Distinct count: 26
Unique (%): 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1.243366281848135
Minimum: 1
Maximum: 90
Zeros: 0
Zeros (%): 0.0%
Memory size: 310.6 KiB

Quantile statistics

Minimum: 1
5th percentile: 1
Q1: 1
Median: 1
Q3: 1
95th percentile: 2
Maximum: 90
Range: 89
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 1.017419435
Coefficient of variation (CV): 0.8182781294
Kurtosis: 2000.81086
Mean: 1.243366282
Median Absolute Deviation (MAD): 0
Skewness: 30.92348051
Sum: 49435
Variance: 1.035142307

Histogram with fixed size bins (bins=10)

Common values (value: count, frequency)
1: 33618 (84.6%)
2: 4532 (11.4%)
3: 924 (2.3%)
4: 364 (0.9%)
5: 114 (0.3%)
6: 92 (0.2%)
8: 25 (0.1%)
10: 25 (0.1%)
7: 23 (0.1%)
9: 11 (< 0.1%)
Other values (16): 31 (0.1%)

Minimum 5 values (value: count, frequency)
1: 33618 (84.6%)
2: 4532 (11.4%)
3: 924 (2.3%)
4: 364 (0.9%)
5: 114 (0.3%)

Maximum 5 values (value: count, frequency)
90: 1 (< 0.1%)
58: 1 (< 0.1%)
50: 1 (< 0.1%)
40: 2 (< 0.1%)
30: 1 (< 0.1%)
 

X_11
Real number (ℝ≥0)

ZEROS

Distinct count: 150
Unique (%): 0.4%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 206.95434995849996
Minimum: 0
Maximum: 332
Zeros: 4268
Zeros (%): 10.7%
Memory size: 310.6 KiB

Quantile statistics

Minimum: 0
5th percentile: 0
Q1: 174
Median: 249
Q3: 249
95th percentile: 316
Maximum: 332
Range: 332
Interquartile range (IQR): 75

Descriptive statistics

Standard deviation: 93.0619573
Coefficient of variation (CV): 0.4496738403
Kurtosis: 0.192539772
Mean: 206.95435
Median Absolute Deviation (MAD): 67
Skewness: -0.9031502716
Sum: 8228298
Variance: 8660.527897

Histogram with fixed size bins (bins=10)

Common values (value: count, frequency)
174: 12100 (30.4%)
249: 11552 (29.1%)
316: 7577 (19.1%)
0: 4268 (10.7%)
303: 707 (1.8%)
127: 519 (1.3%)
179: 357 (0.9%)
74: 334 (0.8%)
102: 208 (0.5%)
263: 176 (0.4%)
Other values (140): 1961 (4.9%)

Minimum 5 values (value: count, frequency)
0: 4268 (10.7%)
1: 3 (< 0.1%)
6: 3 (< 0.1%)
11: 7 (< 0.1%)
12: 1 (< 0.1%)

Maximum 5 values (value: count, frequency)
332: 4 (< 0.1%)
330: 39 (0.1%)
329: 31 (0.1%)
328: 120 (0.3%)
327: 2 (< 0.1%)
 

X_12
Real number (ℝ≥0)

SKEWED
ZEROS

Distinct count: 24
Unique (%): 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.9735405820065897
Minimum: 0.0
Maximum: 90.0
Zeros: 8517
Zeros (%): 21.4%
Memory size: 310.6 KiB

Quantile statistics

Minimum: 0
5th percentile: 0
Q1: 1
Median: 1
Q3: 1
95th percentile: 2
Maximum: 90
Range: 90
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 1.056816506
Coefficient of variation (CV): 1.085539242
Kurtosis: 1723.772266
Mean: 0.973540582
Median Absolute Deviation (MAD): 0
Skewness: 26.64404103
Sum: 38707
Variance: 1.116861127

Histogram with fixed size bins (bins=10)

Common values (value: count, frequency)
1: 26513 (66.7%)
0: 8517 (21.4%)
2: 3420 (8.6%)
3: 797 (2.0%)
4: 276 (0.7%)
5: 101 (0.3%)
6: 59 (0.1%)
8: 18 (< 0.1%)
7: 14 (< 0.1%)
10: 11 (< 0.1%)
Other values (14): 33 (0.1%)

Minimum 5 values (value: count, frequency)
0: 8517 (21.4%)
1: 26513 (66.7%)
2: 3420 (8.6%)
3: 797 (2.0%)
4: 276 (0.7%)

Maximum 5 values (value: count, frequency)
90: 1 (< 0.1%)
58: 1 (< 0.1%)
50: 1 (< 0.1%)
40: 2 (< 0.1%)
30: 1 (< 0.1%)
 

X_13
Real number (ℝ≥0)

Distinct count: 68
Unique (%): 0.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 85.21886868382002
Minimum: 0
Maximum: 117
Zeros: 2
Zeros (%): < 0.1%
Memory size: 310.6 KiB

Quantile statistics

Minimum: 0
5th percentile: 18
Q1: 72
Median: 98
Q3: 103
95th percentile: 112
Maximum: 117
Range: 117
Interquartile range (IQR): 31

Descriptive statistics

Standard deviation: 27.55532481
Coefficient of variation (CV): 0.3233476956
Kurtosis: 1.1341156
Mean: 85.21886868
Median Absolute Deviation (MAD): 11
Skewness: -1.398063774
Sum: 3388217
Variance: 759.2959255

Histogram with fixed size bins (bins=10)

Common values (value: count, frequency)
103: 11775 (29.6%)
72: 7612 (19.1%)
92: 5353 (13.5%)
112: 3468 (8.7%)
98: 2307 (5.8%)
18: 1399 (3.5%)
24: 886 (2.2%)
109: 848 (2.1%)
12: 702 (1.8%)
59: 560 (1.4%)
Other values (58): 4849 (12.2%)

Minimum 5 values (value: count, frequency)
0: 2 (< 0.1%)
1: 8 (< 0.1%)
2: 382 (1.0%)
7: 2 (< 0.1%)
8: 3 (< 0.1%)

Maximum 5 values (value: count, frequency)
117: 1 (< 0.1%)
116: 466 (1.2%)
115: 31 (0.1%)
114: 20 (0.1%)
113: 367 (0.9%)
 

X_14
Real number (ℝ≥0)

ZEROS

Distinct count: 69
Unique (%): 0.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 72.49201438667974
Minimum: 0
Maximum: 142
Zeros: 458
Zeros (%): 1.2%
Memory size: 310.6 KiB

Quantile statistics

Minimum: 0
5th percentile: 25
Q1: 29
Median: 62
Q3: 107
95th percentile: 142
Maximum: 142
Range: 142
Interquartile range (IQR): 78

Descriptive statistics

Standard deviation: 43.35376456
Coefficient of variation (CV): 0.5980488323
Kurtosis: -1.324487842
Mean: 72.49201439
Median Absolute Deviation (MAD): 33
Skewness: 0.2532434153
Sum: 2882210
Variance: 1879.548901

Histogram with fixed size bins (bins=10)

Common values (value: count, frequency)
29: 13659 (34.4%)
93: 5140 (12.9%)
142: 4557 (11.5%)
62: 4070 (10.2%)
80: 2529 (6.4%)
130: 1976 (5.0%)
107: 1234 (3.1%)
14: 1158 (2.9%)
119: 943 (2.4%)
103: 842 (2.1%)
Other values (59): 3651 (9.2%)

Minimum 5 values (value: count, frequency)
0: 458 (1.2%)
2: 1 (< 0.1%)
6: 213 (0.5%)
10: 1 (< 0.1%)
12: 2 (< 0.1%)

Maximum 5 values (value: count, frequency)
142: 4557 (11.5%)
140: 108 (0.3%)
139: 13 (< 0.1%)
138: 227 (0.6%)
136: 101 (0.3%)
 

X_15
Real number (ℝ≥0)

ZEROS

Distinct count: 36
Unique (%): 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 33.44789858899872
Minimum: 0
Maximum: 50
Zeros: 1680
Zeros (%): 4.2%
Memory size: 310.6 KiB

Quantile statistics

Minimum: 0
5th percentile: 23
Q1: 34
Median: 34
Q3: 34
95th percentile: 46
Maximum: 50
Range: 50
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 8.357811091
Coefficient of variation (CV): 0.2498755211
Kurtosis: 8.811395375
Mean: 33.44789859
Median Absolute Deviation (MAD): 0
Skewness: -2.54436585
Sum: 1329855
Variance: 69.85300624

Histogram with fixed size bins (bins=10)

Common values (value: count, frequency)
34: 31646 (79.6%)
43: 2504 (6.3%)
0: 1680 (4.2%)
46: 1079 (2.7%)
23: 1063 (2.7%)
48: 864 (2.2%)
36: 307 (0.8%)
50: 217 (0.5%)
9: 170 (0.4%)
39: 82 (0.2%)
Other values (26): 147 (0.4%)

Minimum 5 values (value: count, frequency)
0: 1680 (4.2%)
1: 1 (< 0.1%)
2: 1 (< 0.1%)
3: 1 (< 0.1%)
4: 4 (< 0.1%)

Maximum 5 values (value: count, frequency)
50: 217 (0.5%)
48: 864 (2.2%)
47: 1 (< 0.1%)
46: 1079 (2.7%)
43: 2504 (6.3%)
 

MULTIPLE_OFFENSE
Boolean

MISSING

Distinct count: 2
Unique (%): < 0.1%
Missing: 15903
Missing (%): 40.0%
Memory size: 310.6 KiB
Common values (value: count, frequency)
1: 22788 (57.3%)
0: 1068 (2.7%)
(Missing): 15903 (40.0%)
 
IS_TEST_DATA
Boolean

Distinct count: 2
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 310.6 KiB
Common values (value: count, frequency)
0: 23856 (60.0%)
1: 15903 (40.0%)
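The two Boolean columns fit together in two ways:

- The 15903 missing MULTIPLE_OFFENSE values equal the number of rows with IS_TEST_DATA = 1.
- The labelled values (22788 + 1068) equal the IS_TEST_DATA = 0 count.

This suggests the label is missing exactly on the test rows. The report's marginal counts alone cannot prove the row-level alignment. A quick consistency check on the reported counts:

```python
N_OBS = 39759
multiple_offense = {1: 22788, 0: 1068, None: 15903}  # None marks missing
is_test_data = {0: 23856, 1: 15903}

labelled = multiple_offense[1] + multiple_offense[0]
assert labelled + multiple_offense[None] == N_OBS
assert labelled == is_test_data[0]
assert multiple_offense[None] == is_test_data[1]

# Share of positives among the labelled rows only.
print(round(100 * multiple_offense[1] / labelled, 1))  # 95.5
```

The report's 57.3% for value 1 is relative to all 39759 rows, missing ones included. Among labelled rows, the share of value 1 is about 95.5%.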
 

YEAR
Real number (ℝ≥0)

Distinct count: 28
Unique (%): 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 2004.2260871752308
Minimum: 1991
Maximum: 2018
Zeros: 0
Zeros (%): 0.0%
Memory size: 310.6 KiB

Quantile statistics

Minimum: 1991
5th percentile: 1992
Q1: 1998
Median: 2004
Q3: 2011
95th percentile: 2017
Maximum: 2018
Range: 27
Interquartile range (IQR): 13

Descriptive statistics

Standard deviation: 7.788343872
Coefficient of variation (CV): 0.003885960732
Kurtosis: -1.114256375
Mean: 2004.226087
Median Absolute Deviation (MAD): 7
Skewness: 0.110890986
Sum: 79686025
Variance: 60.65830028

Histogram with fixed size bins (bins=10)

Common values (value: count, frequency)
2001: 1924 (4.8%)
1996: 1752 (4.4%)
2000: 1685 (4.2%)
1997: 1617 (4.1%)
2008: 1602 (4.0%)
2006: 1561 (3.9%)
2007: 1551 (3.9%)
1998: 1536 (3.9%)
1993: 1532 (3.9%)
2004: 1521 (3.8%)
Other values (18): 23478 (59.1%)

Minimum 5 values (value: count, frequency)
1991: 879 (2.2%)
1992: 1303 (3.3%)
1993: 1532 (3.9%)
1994: 1192 (3.0%)
1995: 1490 (3.7%)

Maximum 5 values (value: count, frequency)
2018: 1367 (3.4%)
2017: 1470 (3.7%)
2016: 1214 (3.1%)
2015: 1182 (3.0%)
2014: 1154 (2.9%)
 

MONTH
Real number (ℝ≥0)

HIGH CORRELATION

Distinct count: 12
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 6.525591689931839
Minimum: 1
Maximum: 12
Zeros: 0
Zeros (%): 0.0%
Memory size: 310.6 KiB

Quantile statistics

Minimum: 1
5th percentile: 1
Q1: 4
Median: 7
Q3: 9
95th percentile: 12
Maximum: 12
Range: 11
Interquartile range (IQR): 5

Descriptive statistics

Standard deviation: 3.289973788
Coefficient of variation (CV): 0.5041648243
Kurtosis: -1.138800574
Mean: 6.52559169
Median Absolute Deviation (MAD): 3
Skewness: -0.03173595572
Sum: 259451
Variance: 10.82392752

Histogram with fixed size bins (bins=10)

Common values (value: count, frequency)
9: 3770 (9.5%)
10: 3698 (9.3%)
7: 3595 (9.0%)
5: 3566 (9.0%)
4: 3524 (8.9%)
8: 3501 (8.8%)
6: 3497 (8.8%)
3: 3385 (8.5%)
11: 3063 (7.7%)
2: 2853 (7.2%)
Other values (2): 5307 (13.3%)

Minimum 5 values (value: count, frequency)
1: 2798 (7.0%)
2: 2853 (7.2%)
3: 3385 (8.5%)
4: 3524 (8.9%)
5: 3566 (9.0%)

Maximum 5 values (value: count, frequency)
12: 2509 (6.3%)
11: 3063 (7.7%)
10: 3698 (9.3%)
9: 3770 (9.5%)
8: 3501 (8.8%)
 

DAY
Real number (ℝ≥0)

Distinct count: 31
Unique (%): 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 15.543776251917805
Minimum: 1
Maximum: 31
Zeros: 0
Zeros (%): 0.0%
Memory size: 310.6 KiB

Quantile statistics

Minimum: 1
5th percentile: 2
Q1: 8
Median: 15
Q3: 23
95th percentile: 29
Maximum: 31
Range: 30
Interquartile range (IQR): 15

Descriptive statistics

Standard deviation: 8.793849914
Coefficient of variation (CV): 0.5657473301
Kurtosis: -1.175693136
Mean: 15.54377625
Median Absolute Deviation (MAD): 8
Skewness: 0.01609533896
Sum: 618005
Variance: 77.3317963

Histogram with fixed size bins (bins=10)

Common values (value: count, frequency)
1: 1611 (4.1%)
15: 1418 (3.6%)
7: 1381 (3.5%)
12: 1358 (3.4%)
13: 1356 (3.4%)
20: 1351 (3.4%)
2: 1347 (3.4%)
9: 1339 (3.4%)
14: 1338 (3.4%)
18: 1329 (3.3%)
Other values (21): 25931 (65.2%)

Minimum 5 values (value: count, frequency)
1: 1611 (4.1%)
2: 1347 (3.4%)
3: 1259 (3.2%)
4: 1241 (3.1%)
5: 1257 (3.2%)

Maximum 5 values (value: count, frequency)
31: 692 (1.7%)
30: 1181 (3.0%)
29: 1183 (3.0%)
28: 1248 (3.1%)
27: 1270 (3.2%)
 

IS_WEEKEND
Boolean

Distinct count: 2
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 310.6 KiB
Common values (value: count, frequency)
0: 28702 (72.2%)
1: 11057 (27.8%)
 

QUARTER_OF_YEAR
Categorical

HIGH CORRELATION

Distinct count: 4
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 310.6 KiB
Common values (value: count, frequency)
3: 10866 (27.3%)
2: 10587 (26.6%)
4: 9270 (23.3%)
1: 9036 (22.7%)

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1
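The high-correlation warning between MONTH and QUARTER_OF_YEAR is expected if the quarter is derived from the month. The report does not state how the quarter was computed. Assuming the usual calendar mapping, quarter = (month − 1) // 3 + 1, aggregating the MONTH counts reproduces the QUARTER_OF_YEAR counts exactly:

```python
from collections import Counter

# MONTH value counts, taken from the MONTH tables above.
MONTH_COUNTS = {1: 2798, 2: 2853, 3: 3385, 4: 3524, 5: 3566, 6: 3497,
                7: 3595, 8: 3501, 9: 3770, 10: 3698, 11: 3063, 12: 2509}


def quarter_of(month):
    """Calendar quarter (1-4) of a month number (1-12)."""
    return (month - 1) // 3 + 1


quarters = Counter()
for month, count in MONTH_COUNTS.items():
    quarters[quarter_of(month)] += count

print(dict(sorted(quarters.items())))
# {1: 9036, 2: 10587, 3: 10866, 4: 9270}
```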

Interactions

Correlations

Pearson's r

Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1: -1 indicates total negative linear correlation, 0 indicates no linear correlation, and 1 indicates total positive linear correlation. Furthermore, r is invariant under separate changes in location and (positive) scale of the two variables. Consequently, for an exact linear relationship Y = a + bX with b ≠ 0, r is +1 or -1 according to the sign of b, however steep or shallow the line is.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
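Below is a direct transcription of this definition as a standard-library sketch (the function name is illustrative). It divides by n in both the covariance and the standard deviations. That factor cancels in the ratio, so the sample (n − 1) versions give the same r. The first two calls illustrate the invariance described above: any non-horizontal straight line gives r = ±1, with the sign set by the slope.

```python
import math


def pearson_r(xs, ys):
    """Covariance of xs and ys divided by the product of their standard deviations."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs) / n)
    sy = math.sqrt(sum((y - my) ** 2 for y in ys) / n)
    return cov / (sx * sy)


x = [1, 2, 3, 4, 5]
print(round(pearson_r(x, [3 * v + 5 for v in x]), 12))  # 1.0   (positive slope)
print(round(pearson_r(x, [-0.1 * v for v in x]), 12))   # -1.0  (negative slope)
print(round(pearson_r(x, [v ** 3 for v in x]), 4))      # 0.9431 (monotonic, not linear)
```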

Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables. It is therefore better than Pearson's r at capturing nonlinear monotonic relationships. Its value lies between -1 and +1: -1 indicates total negative monotonic correlation, 0 indicates no monotonic correlation, and 1 indicates total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
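A sketch of that recipe. Each value is replaced by its rank, and tied values share the average of the ranks they span (the usual convention, assumed here). Pearson's formula is then applied to the ranks. Because x³ is a strictly increasing function of x, ρ is exactly 1 even though the relationship is not linear.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values receive the mean of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of equal values starting at sorted position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of positions i..j, converted to 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson_r(xs, ys):
    """Pearson's r; the 1/n factors cancel, so raw sums are used."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def spearman_rho(xs, ys):
    """Pearson correlation of the rank variables of xs and ys."""
    return pearson_r(average_ranks(xs), average_ranks(ys))


x = [1, 2, 3, 4, 5]
print(spearman_rho(x, [v ** 3 for v in x]))  # 1.0
print(average_ranks([10, 20, 20, 30]))       # [1.0, 2.5, 2.5, 4.0]
```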

Kendall's τ

Like Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1: -1 indicates total negative correlation, 0 indicates no correlation, and 1 indicates total positive correlation.

To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is then the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs, n(n − 1)/2.
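The pair-counting definition translates directly into an O(n²) sketch. This is the plain form described above, often called τ-a. Pairs tied in either variable count as neither concordant nor discordant, but they still sit in the denominator. Library implementations such as `scipy.stats.kendalltau`, which pandas' `DataFrame.corr(method="kendall")` relies on, compute the tie-adjusted τ-b by default. Results therefore differ when ties are present.

```python
from itertools import combinations


def kendall_tau_a(xs, ys):
    """(concordant - discordant) / total number of pairs, n(n - 1) / 2."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    concordant = discordant = 0
    for i, j in combinations(range(n), 2):
        s = (xs[i] - xs[j]) * (ys[i] - ys[j])
        if s > 0:
            concordant += 1
        elif s < 0:
            discordant += 1
        # s == 0: tied in x or y, counted as neither
    return (concordant - discordant) / (n * (n - 1) / 2)


print(kendall_tau_a([1, 2, 3, 4], [1, 3, 2, 4]))  # 0.6666666666666666
print(kendall_tau_a([1, 2, 3], [3, 2, 1]))        # -1.0
```

In the first call, five of the six pairs are concordant and one, (2, 3) against (3, 2), is discordant, giving (5 − 1) / 6.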

Phik (φk)

Phik (φk) is a newer, practical correlation coefficient. It works consistently across categorical, ordinal and interval variables, captures non-linear dependency, and reverts to the Pearson correlation coefficient for a bivariate normal input distribution. The phik package provides extensive documentation.

Missing values

Sample

First rows

df_index INCIDENT_ID X_2 X_4 X_5 X_6 X_7 X_8 X_9 X_10 X_11 X_12 X_13 X_14 X_15 MULTIPLE_OFFENSE IS_TEST_DATA YEAR MONTH DAY IS_WEEKEND QUARTER_OF_YEAR
00CR_1026593621561611741.09229360.0020047413
11CR_189752370011171612361.0103142341.00201771803
22CR_184637335102311741.011093341.00201731501
33CR_1390713321711612491.07229341.00200921301
44CR_1093353321830511740.011229431.00200541302
55CR_9626345103101613031.07262341.0020034702
66CR_1314003073710511740.011229431.00200812201
77CR_11981873980513161.07262341.00199351402
88CR_1841344965831113161.010314341.00201682113
99CR_3263446515100521451.010329340.00199682513

Last rows

df_index INCIDENT_ID X_2 X_4 X_5 X_6 X_7 X_8 X_9 X_10 X_11 X_12 X_13 X_14 X_15 MULTIPLE_OFFENSE IS_TEST_DATA YEAR MONTH DAY IS_WEEKEND QUARTER_OF_YEAR
3974915893CR_148375335100522492.01038034NaN12011102214
3975015894CR_677362141640511741.0989334NaN1200062902
3975115895CR_185890535831612491.0722934NaN1201762902
3975215896CR_898683351016101.0722934NaN1200351112
3975315897CR_148343335103613031.0722934NaN120119103
3975415898CR_44468227315100511740.0722943NaN11997112804
3975515899CR_15846035351023202.0729334NaN120126912
3975615900CR_11594626906426101.0726234NaN1200642212
3975715901CR_1376632141271622492.0926234NaN120094302
3975815902CR_33545465425612491.0722934NaN1199642402